[New Workflow] AMR-search for neisseria gonorrhoeae samples #743
…t for later integration
…made input_base more robust
Some initial changes here @awh082834. Mostly minor documentation/namespace polishing. Well done with the overarching schema. We'll see what the UAT brings about for feedback.
workflow amr_search_workflow {
  input {
    File input_fasta
    String amr_search_database
Please consider changing this as the user doesn't need to pass in a DB, only the taxon code that then references the correct DB to use. Taxon/taxon_of_interest/taxon_code might make more sense. Some of this will depend on how we do the mapping when we plug into TheiaProk, but just dropping this as a note for future work when we implement that integration upstream.
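As a rough illustration of the taxon-based input suggested here, the lookup could be sketched as below. This is only a sketch under assumptions: the function name `resolve_taxon_code` and the alias table are hypothetical and not part of this PR, and the set of species shown is illustrative rather than the list AMRsearch actually supports. The numeric values are the NCBI Taxonomy IDs for those organisms.

```python
# Hypothetical alias table: taxon name -> NCBI Taxonomy ID (as a string).
# The names and the helper below are illustrative, not taken from this PR.
TAXON_CODES = {
    "neisseria gonorrhoeae": "485",
    "salmonella typhi": "90370",
    "escherichia coli": "562",
}


def resolve_taxon_code(taxon: str) -> str:
    """Return an NCBI taxon code for a taxon name, or pass a numeric code through."""
    key = taxon.strip().lower()
    if key.isdigit():
        # The caller already supplied a taxon code.
        return key
    try:
        return TAXON_CODES[key]
    except KeyError:
        raise ValueError(f"Unsupported taxon: {taxon!r}") from None
```

With a helper like this, the user-facing input would be a taxon name or code, and the database choice would stay an internal detail.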

## AMR_Search_PHB

The AMR_Search workflow is a standalone version of Pathogenwatchs AMR profiling functionality utilizing `AMRsearch` tool from Pathogenwatch.
Pathogenwatch's
| amr_search | **disk_size** | Integer | Amount of storage (in GB) to allocate to the task | 50 | Optional |
| amr_search | **docker** | String | The docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.0 | Optional |
| amr_search | **memory** | Integer | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
| amr_search_workflow | **amr_search_database** | String | NCBI taxon code of samples known taxonomy, see above supported species | | Required |
move up so required inputs are listed first

This task performs *in silico* antimicrobial resistance (AMR) profiling for *Neisseria gonorrhoeae* using **AMRsearch**, the primary tool used by [Pathogenwatch](https://pathogen.watch/) to genotype and infer antimicrobial resistance (AMR) phenotypes from assembled microbial genomes.

**AMRsearch** screens against an in-house library of curated genotypes and inferred phenotypes, developed in collaboration with community experts. Resistance phenotypes are determined based on both **resistance genes** and **mutations**, and the system accounts for interactions between multiple SNPs, genes, and suppressors. Predictions follow **S/I/R classification** (*Sensitive, Intermediate, Resistant*).
This isn't Theiagen's "in-house library" so we probably want to adjust this. I am guessing this is a copy/paste from PW description of AMR_search (totally fine), but we'll want to have it reflect Theiagen, not PW. I.e. "Screens against Pathogenwatch's library of curated genotypes..."
Ah yes, this was a holdover from the original documentation that I decided to keep. In context it definitely sounds like it's our library. I'll get this changed!
| Software Documentation | [Pathogenwatch](https://cgps.gitbook.io/pathogenwatch) |
| Original Publication(s) | [PAARSNP: *rapid genotypic resistance prediction for *Neisseria gonorrhoeae*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7545138/) |

!!! techdetails "`parse_amr_json.wdl` Details"
remove
| amr_search_docker | String | Docker image used to run AMR_Search |
| amr_search_version | String | Version of AMR_Search libraries used |

## References (if applicable)
remove "(if applicable)"
}
command <<<
  # Extract base name without path or extension
  # Added suffix strip to handle cases of differing FASTA extensions. Was hard coded to .fasta
"Strip suffix to handle cases of differing FASTA extension."
we need not describe what was added/what previously existed, only comment the existing code for functional understanding/comprehension
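For readers following the thread, the suffix-stripping idea can be sketched in Python as below. This is not the shell code from the task. The extension list and the name `fasta_base_name` are assumptions made for illustration; the PR does not enumerate which FASTA extensions it handles.

```python
from pathlib import Path

# Assumed set of FASTA extensions; the PR does not list them explicitly.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna", ".fas")


def fasta_base_name(path: str) -> str:
    """Return the file name without its directory or FASTA extension."""
    name = Path(path).name
    for suffix in FASTA_SUFFIXES:
        if name.lower().endswith(suffix):
            # Strip only the trailing extension so dots inside the name survive.
            return name[: -len(suffix)]
    # Unknown extension: leave the name untouched.
    return name
```

For example, `fasta_base_name("/data/in/sample.v2.fna")` keeps the inner dot and returns `sample.v2`.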
# Move the output file from the input directory to the working directory
mv $(dirname ~{input_fasta})/${input_base}_paarsnp.jsn ./~{samplename}_paarsnp_results.jsn

python3 /scripts/parse_amr_json.py \
maybe worth a comment here with a link to the location of this script, for posterity's sake
Sounds good! I'll put the link to the current dev branch of the docker builds repo and will update it when it gets merged.
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
This PR creates a standalone workflow for Pathogenwatch AMR-search in order to use its AMR profiling steps. The AMR-search output is processed by an integrated Python script, `parse_amr_json`, which extracts the relevant data into a CSV file and a PNG summary table for visualization. This workflow will likely be integrated into TheiaProk later. Documentation for this new workflow has been created.
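To make the JSON-to-CSV step in the summary more concrete, here is a minimal sketch of that kind of conversion. It is not the `parse_amr_json` script itself. The JSON layout (`resistanceProfile`, `agent`, `state`, `determinants`), the example values, and the function name `profile_to_csv` are all assumptions for illustration, since the real PAARSNP output schema is not shown in this PR.

```python
import csv
import io
import json

# Hypothetical, simplified input; the real PAARSNP JSON schema may differ.
EXAMPLE_JSON = """
{
  "resistanceProfile": [
    {"agent": "tetracycline", "state": "RESISTANT",
     "determinants": ["tetM", "rpsj_V57M"]},
    {"agent": "ciprofloxacin", "state": "SENSITIVE", "determinants": []}
  ]
}
"""


def profile_to_csv(json_text: str) -> str:
    """Flatten the assumed resistance profile into CSV text, one row per agent."""
    data = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["agent", "state", "determinants"])
    for entry in data.get("resistanceProfile", []):
        # Join multiple determinants so each agent stays on a single row.
        writer.writerow(
            [entry["agent"], entry["state"], ";".join(entry["determinants"])]
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_to_csv(EXAMPLE_JSON))
```

Keeping one row per agent mirrors a summary-table layout, which is also a convenient shape to render as a PNG table afterwards.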
⚡ Impacted Workflows/Tasks
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
Implementation of wf_amr_search.wdl along with task_amr_search.wdl
This includes the building of a new docker container with Pathogenwatch AMR-search and PAARSNP installed.
⚙️ Algorithm
Using a microbial FASTA file, PAARSNP is run and generates a JSON file containing AMR profiling information. This JSON is then passed to a Python script, `parse_amr_json.py`, which is housed within the docker container `us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.0`. This script then parses the information within the JSON and creates a CSV and PNG that resemble the output given from Pathogenwatch's AMR profile.
➡️ Inputs
⬅️ Outputs
🧪 Testing
Initial Terra Test
Test Containing All Species
E. coli samples were not included in this test because no publicly available examples exist in Pathogenwatch. PAARSNP/AMRSearch was not run prior to a certain date.
GCA_011383385_typhi: Newer database of 0.0.20 has additional sul2 predicted
GCA_042331435_GC: Newer database of 0.0.20 has additional tetM and rpsj_V57M predicted. Inferred tetracycline resistance changed from intermediate to resistant.
Suggested Scenarios for Reviewer to Test
wf_amr_search.wdl provides the correct outputs, PNG, JSON, and CSV.
If Pathogenwatch was being used previously, run against existing results.
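When comparing against existing Pathogenwatch results, a small set difference is often enough to surface changes like the ones noted in the testing section. The sketch below assumes the determinants for each run have already been collected into Python sets. The function name `diff_determinants` and the placeholder `determinant_x` are hypothetical and not real results.

```python
from typing import Dict, List, Set


def diff_determinants(previous: Set[str], current: Set[str]) -> Dict[str, List[str]]:
    """Report determinants gained and lost between two runs of the same sample."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Illustrative values only; "determinant_x" is a placeholder, not a real call.
previous_run = {"determinant_x"}
current_run = {"determinant_x", "tetM", "rpsj_V57M"}

if __name__ == "__main__":
    print(diff_determinants(previous_run, current_run))
```

An empty `added` and `removed` list for a sample means both runs predicted the same determinants, so only the non-empty cases need a closer look.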
🔬 Final Developer Checklist
workflows_overview tables to be the tag for the next upcoming release. If you do not know the tag, please put "vX.X.X"
🎯 Reviewer Checklist